CERN: Confidence-Energy Recurrent Network for Group Activity Recognition
This work is about recognizing human activities occurring in videos at
distinct semantic levels, including individual actions, interactions, and group
activities. The recognition is realized using a two-level hierarchy of Long
Short-Term Memory (LSTM) networks, forming a feed-forward deep architecture,
which can be trained end-to-end. Relative to existing LSTM architectures, we
make two key contributions, which give our approach its name: the
Confidence-Energy Recurrent Network (CERN). First, instead of using the common
softmax layer for prediction, we specify a novel energy layer (EL) for
estimating the energy of our predictions. Second, rather than finding the
common minimum-energy class assignment, which may be numerically unstable under
uncertainty, we specify that the EL additionally computes the p-values of the
solutions, and in this way estimates the most confident energy minimum. The
evaluation on the Collective Activity and Volleyball datasets demonstrates: (i)
advantages of our two contributions relative to the common softmax and
energy-minimization formulations and (ii) a superior performance relative to
the state-of-the-art approaches.
Comment: Accepted to the IEEE Conference on Computer Vision and Pattern
Recognition (CVPR), 2017
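The "most confident energy minimum" can be illustrated with a toy sketch. Here, instead of simply taking the argmin over class energies, we estimate a p-value for each candidate minimum under random perturbations of the predicted energies and keep the minimum with the smallest p-value. The bootstrap-style perturbation, the noise model, and all names below are illustrative assumptions, not the paper's exact energy-layer formulation.

```python
import numpy as np

rng = np.random.default_rng(0)

def confident_energy_minimum(energies, noise_scale=0.1, n_samples=1000):
    """energies: (n_classes,) predicted energies.
    Returns (class_index, p_value) of the most confident energy minimum.
    NOTE: a hypothetical sketch, not CERN's actual energy layer."""
    energies = np.asarray(energies, dtype=float)
    # Perturb the energies to model uncertainty in the predictions.
    samples = energies + rng.normal(
        0.0, noise_scale, size=(n_samples, energies.size))
    # p-value of class c: fraction of perturbed draws where c is NOT the minimum.
    wins = (samples.argmin(axis=1)[:, None]
            == np.arange(energies.size)).mean(axis=0)
    p_values = 1.0 - wins
    best = int(np.argmin(p_values))  # most confident minimum
    return best, float(p_values[best])

# Classes 1 and 2 have nearly equal energy; class 1 is the more stable minimum.
cls, p = confident_energy_minimum([0.9, 0.2, 0.25])
```

The point of the sketch is that when two energy minima are numerically close, the plain argmin is unstable under small perturbations, whereas the p-value ranking prefers the assignment that remains the minimum most consistently.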
SLEDGE: Sequential Labeling of Image Edges for Boundary Detection
Our goal is to detect boundaries of objects or surfaces
occurring in an arbitrary image. We present a new approach
that discovers boundaries by sequential labeling of
a given set of image edges. A visited edge is labeled as
on or off a boundary, based on the edge’s photometric and
geometric properties, and evidence of its perceptual grouping
with already identified boundaries. We use both local
Gestalt cues (e.g., proximity and good continuation), and
the global Helmholtz principle of non-accidental grouping.
A new formulation of the Helmholtz principle is specified
as the entropy of a layout of image edges. For boundary
discovery, we formulate a new policy-iteration algorithm,
called SLEDGE. Training of SLEDGE is iterative. In each
training image, SLEDGE labels a sequence of edges, which
induces loss with respect to the ground truth. These sequences
are then used as training examples for learning
SLEDGE in the next iteration, such that the total loss is
minimized. For extracting image edges that are input to
SLEDGE, we use our new low-level detector. It finds salient
pixel sequences that separate distinct textures within the image.
On the benchmark Berkeley Segmentation Datasets
300 and 500, our approach proves robust and effective. We
outperform the state of the art both in recall and precision
for different input sets of image edges.
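The sequential labeling idea can be sketched as a greedy loop: each visited edge is labeled "on" or "off" a boundary from its own strength plus grouping evidence (proximity and good continuation) with edges already labeled "on". The scoring function, feature weights, and edge representation below are assumptions for illustration; SLEDGE learns its labeling policy iteratively rather than using fixed weights.

```python
import math

def proximity(e1, e2):
    # Gestalt proximity cue: closer endpoints -> stronger grouping.
    (x1, y1), (x2, y2) = e1["end"], e2["start"]
    return math.exp(-math.hypot(x2 - x1, y2 - y1))

def continuation(e1, e2):
    # Good continuation: similar orientations -> stronger grouping.
    return math.cos(e1["theta"] - e2["theta"]) ** 2

def label_edges(edges, w_strength=1.0, w_group=0.5, threshold=0.6):
    """Greedy sequential labeling sketch (hypothetical fixed-weight policy)."""
    boundary, labels = [], []
    for e in edges:
        # Grouping evidence from the strongest already-accepted boundary edge.
        group = max((w_group * proximity(b, e) * continuation(b, e)
                     for b in boundary), default=0.0)
        on = w_strength * e["strength"] + group > threshold
        labels.append(on)
        if on:
            boundary.append(e)
    return labels

edges = [
    {"start": (0, 0), "end": (1, 0), "theta": 0.0, "strength": 0.9},   # strong
    {"start": (1, 0), "end": (2, 0), "theta": 0.0, "strength": 0.3},   # weak, continues edge 0
    {"start": (5, 5), "end": (5, 6), "theta": 1.57, "strength": 0.2},  # weak, isolated
]
labels = label_edges(edges)
```

The second edge is too weak to pass the threshold on its own, but its proximity and collinearity with the first accepted edge push it over; the isolated third edge gets no grouping support and stays "off", which is the behavior the perceptual-grouping cues are meant to produce.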